Detecting Comment Spam through Content Analysis
Authors
Abstract
In the Web 2.0 era, individual Internet users can also act as information providers, publishing information or posting comments with ease. However, some participants may spread irresponsible remarks or post irrelevant comments for commercial gain. This kind of so-called comment spam severely degrades information quality. This paper attempts to detect comment spam automatically through content analysis, using some previously undescribed features. Experiments on a real data set show that our combined heuristics can correctly identify comment spam with high precision (90.4%) and recall (84.5%).
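For readers unfamiliar with the two reported metrics, the following minimal sketch shows how precision and recall are commonly computed for a binary spam classifier. The function name `precision_recall` and the labels are hypothetical and made up for illustration; they are not the paper's data or its detection method.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean labels (True = spam)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # spam correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # legitimate comment flagged as spam
    fn = sum(1 for p, a in pairs if a and not p)    # spam that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for illustration only; not taken from the paper's data set.
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]

p, r = precision_recall(predicted, actual)
print(f"precision={p:.3f} recall={r:.3f}")
```

Precision answers "of the comments flagged as spam, how many really were spam?", while recall answers "of the real spam comments, how many were flagged?".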
Similar resources
Poster: Effort-based Detection of Comment Spammers
Social media has become ubiquitous and important for content sharing. A typical example of how users contribute content to a social media platform is through comment threads in online articles. Unfortunately, there is an increasing prevalence of malicious activity in these threads by spammers through comment messages. The existing approaches tackling comment spam are comment-level in that they ...
NEIGHBORWATCHER: A Content-Agnostic Comment Spam Inference System
Comment spam has become a popular means for spammers to attract direct visits to target websites, or to manipulate search ranks of the target websites. Through posting a small number of spam messages on each victim website (e.g., normal websites such as forums, wikis, guestbooks, and blogs, which we term as spam harbors in this paper) but spamming on a large variety of harbors, spammers can not...
Detecting Content Spam on the Web through Text Diversity Analysis
Web spam is considered to be one of the greatest threats to modern search engines. Spammers use a wide range of content generation techniques known as content spam to fill search results with low quality pages. We argue that content spam must be tackled using a wide range of content quality features. In this paper we propose a set of content diversity features based on frequency rank distributi...
A Self-Supervised Approach to Comment Spam Detection Based on Content Analysis
This paper studies the problems and threats posed by a type of spam in the blogosphere, called blog comment spam. It explores the challenges introduced by comment spam, generalizing the analysis substantially to any other short text type spam. The authors analyze different high-level features of spam and legitimate comments based on the content of blog postings. The authors use these features t...
Library blogs and user participation: a survey about comment spam in library blogs
Purpose: The purpose of this research is to identify and describe the impact of comment spam in library blogs. Three research questions guided the study: current level of commenting in library blogs; librarians' perception of comment spam; and techniques used to address the comment spam problem. Design/methodology/approach: A quantitative approach is used to investigate research questions. Inform...